In this notebook, we'll explore how Netflix's content varies across countries and genres using a global dataset.
We'll also use interactive visualizations to see how genre diversity differs between countries.
import pandas as pd
import plotly.express as px
Let's load the dataset and take a quick look.
df = pd.read_csv("netflix.csv")
df.head()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
This dataset includes thousands of Netflix titles with metadata like release year, country, content type, and genres. Previewing the data allows us to check for missing values and understand the structure. We notice several null entries, especially in columns like director, cast, and country, which we’ll need to address during cleaning.
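To make the null check concrete, here is a minimal sketch that counts missing values per column and expresses them as a share of all rows. The `sample` frame is a hypothetical stand-in for the real data, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical miniature stand-in for netflix.csv, so the sketch runs on its own.
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "country": ["India", None, "Japan", "Spain"],
})

# Count missing values per column and express them as a share of all rows.
missing = sample.isna().sum().to_frame("missing")
missing["share"] = missing["missing"] / len(sample)

# Put the columns with the most gaps first.
missing = missing.sort_values("missing", ascending=False)
print(missing)
```

On the real data, the same three lines would run against df instead of sample.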
We'll handle missing values and convert date columns to proper formats.
We'll remove rows missing key information and create a year_added column to help with time-based analysis.
df = df.dropna(subset=["country", "date_added", "listed_in"])
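# Some date strings may carry stray leading or trailing spaces; strip them before parsing
df["date_added"] = pd.to_datetime(df["date_added"].str.strip())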
df["year_added"] = df["date_added"].dt.year
df["country"] = df["country"].str.strip()
df["listed_in"] = df["listed_in"].str.strip()
Cleaning the dataset is critical for reliable analysis. We remove rows missing country, date_added, or listed_in since these are key to understanding content distribution, time trends, and genre variety. We also convert date_added to a datetime format and extract the year for time-based visualizations. Finally, trimming leading and trailing whitespace from country and listed_in keeps the values consistent before we split them.
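The cleaning steps can be sketched on a tiny frame as follows. The `sample` rows and the explicit date format `"%B %d, %Y"` are assumptions for illustration (dates shaped like "September 25, 2021"); the real file may need a different format or none at all.

```python
import pandas as pd

# Hypothetical rows; one has no country, one date has a stray leading space.
sample = pd.DataFrame({
    "country": [" United States", "India ", None],
    "date_added": ["September 25, 2021", " August 4, 2017", "July 1, 2019"],
    "listed_in": ["Dramas", "Comedies", "Documentaries"],
})

# Drop rows that lack any of the key fields, working on an explicit copy.
clean = sample.dropna(subset=["country", "date_added", "listed_in"]).copy()

# Strip whitespace first so every date string matches the assumed format.
clean["date_added"] = pd.to_datetime(clean["date_added"].str.strip(), format="%B %d, %Y")
clean["year_added"] = clean["date_added"].dt.year
clean["country"] = clean["country"].str.strip()
print(clean)
```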
Some titles are associated with more than one country.
We split these into separate rows so each title-country pair is counted individually.
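df_exp = df.copy()
# Split on commas, then strip each piece so stray spaces or commas don't create duplicate or empty countries
df_exp["country"] = df_exp["country"].str.split(",")
df_exp = df_exp.explode("country")
df_exp["country"] = df_exp["country"].str.strip()
df_exp = df_exp[df_exp["country"] != ""]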
Many titles are listed under multiple countries, especially in co-productions or shared distribution rights. By splitting these into individual rows, we avoid undercounting shared content and get a more accurate picture of how geographically distributed Netflix’s catalog is. Note that counts from here on are title-country pairs, so their total exceeds the number of unique titles. This transformation is essential for accurate country-level insights later on.
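As a quick sanity check on the split, the sketch below explodes a small hypothetical frame and compares the number of title-country pairs with the number of unique titles. The trailing comma in the last row is a made-up edge case, included only to show why stripping and filtering after the split can help.

```python
import pandas as pd

# Hypothetical rows: one co-production, one single-country title, one messy value.
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "country": ["United States, India", "India", "United Kingdom,"],
})

# One row per title-country pair.
pairs = sample.assign(country=sample["country"].str.split(",")).explode("country")

# Clean each piece and drop empty strings left behind by stray commas.
pairs["country"] = pairs["country"].str.strip()
pairs = pairs[pairs["country"] != ""]

n_titles = sample["title"].nunique()
n_pairs = len(pairs)
print(f"{n_titles} titles -> {n_pairs} title-country pairs")
```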
We’ll now find the countries that appear most often in the dataset.
top_countries = df_exp["country"].value_counts().nlargest(10).reset_index()
top_countries.columns = ["Country", "Count"]
top_countries
px.bar(top_countries, x='Country', y='Count', title='Top 10 Countries by Number of Titles')
This bar chart highlights the countries most represented in Netflix’s catalog. Unsurprisingly, the U.S. leads by a large margin, likely because of its vast entertainment industry and Netflix’s home base there. India, the UK, and South Korea are also well represented, reflecting growing investment in Bollywood, K-dramas, and international licensing. These figures also hint at which markets Netflix prioritizes.
Like countries, titles often belong to multiple genres.
We’ll separate them out so we can analyze genre frequency by country.
df_genre = df_exp.copy()
df_genre["genre"] = df_genre["listed_in"].str.split(", ")
df_genre = df_genre.explode("genre")
Just as with countries, many titles span multiple genres. A show might be labeled both “Drama” and “Romantic,” or a documentary might also fall under “Politics.” Exploding the genre column helps us treat each genre equally in our analysis, which is especially useful when analyzing preferences or diversity by country.
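One simple way to put a number on genre diversity is to count the distinct genres that appear for each country after exploding. This is a sketch under assumptions: the `sample` rows are hypothetical, and counting distinct labels is just one possible diversity measure.

```python
import pandas as pd

# Hypothetical title-country rows with comma-separated genre labels.
sample = pd.DataFrame({
    "country": ["India", "India", "Japan"],
    "listed_in": ["Dramas, Comedies", "Dramas, International Movies", "Anime Series"],
})

# One row per title-country-genre combination.
genres = sample.assign(genre=sample["listed_in"].str.split(", ")).explode("genre")

# Number of distinct genres seen for each country, most diverse first.
diversity = genres.groupby("country")["genre"].nunique().sort_values(ascending=False)
print(diversity)
```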
This chart shows the overall distribution of Movies and TV Shows on Netflix.
type_counts = df['type'].value_counts().reset_index()
type_counts.columns = ['Content Type', 'Count']
px.bar(type_counts, x='Content Type', y='Count', title='Distribution of Movies vs TV Shows')
This plot shows the breakdown between movies and TV shows in the catalog. While movies dominate in quantity, TV shows have been increasingly emphasized by Netflix due to their binge-worthy appeal and subscriber retention potential. Over time, Netflix’s investments in long-form series across various languages have also contributed to this shift.
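Whether TV shows are gaining ground can be checked directly by looking at the share of each content type per year added. The sketch below uses `pd.crosstab` on a hypothetical `sample` frame; it assumes a year_added column like the one created during cleaning.

```python
import pandas as pd

# Hypothetical rows: content type and the year each title was added.
sample = pd.DataFrame({
    "year_added": [2016, 2016, 2020, 2020],
    "type": ["Movie", "Movie", "Movie", "TV Show"],
})

# Row-normalized table: each year's shares sum to 1.
by_year = pd.crosstab(sample["year_added"], sample["type"], normalize="index")
print(by_year)
```

A rising TV Show column over the years would support the shift described above; a flat one would not.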
We'll group our exploded data to count how many times each genre appears in each country. Then we filter to just the top 10 countries to keep the visualization clear.
genre_counts = df_genre.groupby(["country", "genre"]).size().reset_index(name="count")
# Keep only the top 10 countries computed earlier
filtered_genre_counts = genre_counts[genre_counts["country"].isin(top_countries["Country"])]
# Plot grouped bar chart
fig = px.bar(filtered_genre_counts,
             x='country',
             y='count',
             color='genre',
             title='Genre Breakdown by Country',
             labels={'count': 'Title Count'},
             barmode='group')
fig.show()
Grouping genre counts by country lets us examine content tendencies. For example, romantic comedies may be more prevalent in India, while South Korea might skew toward drama and thrillers. These trends reflect regional audience tastes and production cultures, and help Netflix optimize content recommendations and investments per market.
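Raw counts make large catalogs dominate the chart, so comparing tendencies across countries is easier with within-country shares. The sketch below normalizes a hypothetical `counts` frame shaped like genre_counts; the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical genre counts in the same shape as genre_counts.
counts = pd.DataFrame({
    "country": ["India", "India", "South Korea", "South Korea"],
    "genre": ["Dramas", "Comedies", "Dramas", "Thrillers"],
    "count": [30, 10, 6, 2],
})

# Divide each count by its country's total so shares sum to 1 per country.
country_totals = counts.groupby("country")["count"].transform("sum")
counts["share"] = counts["count"] / country_totals
print(counts)
```

Plotting share instead of count on the y-axis would put countries of very different catalog sizes on the same scale.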
Why it matters: Are certain genres more international than others?
How we’ll do it:
Explode both the country and listed_in columns
Sunburst chart: genres nested within each country
# genre_counts was already built above from the exploded country and genre columns, so we reuse it here
top = genre_counts[genre_counts['country'].isin(top_countries['Country'])]
px.sunburst(top, path=['country', 'genre'], values='count',
            title='Genres by Country (Sunburst View)')
The sunburst visualization lets us explore genre distribution hierarchically, starting from countries and drilling down to specific genres. Countries with wide genre variation, like the U.S. and UK, span most of Netflix’s catalog, while more niche-producing countries might concentrate on a few categories. This helps evaluate how balanced or specialized a country’s Netflix offering is.
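One possible way to quantify "balanced versus specialized" is the Shannon entropy of each country's genre counts: higher values mean titles are spread more evenly across genres. This measure is an assumption added for illustration and is not part of the original analysis; the `counts` frame and the `shannon_entropy` helper are hypothetical.

```python
import numpy as np
import pandas as pd


def shannon_entropy(counts: pd.Series) -> float:
    """Entropy in bits of a series of strictly positive counts."""
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Hypothetical genre counts: country A is evenly split, country B has one genre.
counts = pd.DataFrame({
    "country": ["A", "A", "B"],
    "genre": ["Dramas", "Comedies", "Dramas"],
    "count": [10, 10, 20],
})

# Groupby sizes are never zero, so log2 is safe for every row here.
balance = counts.groupby("country")["count"].apply(shannon_entropy)
print(balance)
```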
This line chart shows how Netflix has increased the number of titles from top countries year-over-year.
# Filter to top countries
df_years = df_exp[df_exp["country"].isin(top_countries["Country"])]
# Count titles added by year per country
country_growth = df_years.groupby(["year_added", "country"]).size().reset_index(name="count")
# Plot
px.line(country_growth, x="year_added", y="count", color="country",
        title="Netflix Content Growth by Country Over Time",
        labels={"year_added": "Year", "count": "Titles Added"})
This line chart shows how many titles from each country were added to Netflix each year. After around 2015, there’s a noticeable surge across the top countries, especially the U.S., India, and South Korea. This aligns with Netflix’s global expansion, during which it invested in local productions to attract regional audiences while also making them available internationally.
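The chart plots titles added per year, which is not the same as total catalog size. A running total per country can be sketched by pivoting to a year-by-country table and taking a cumulative sum. The `growth` frame below is a hypothetical stand-in for country_growth.

```python
import pandas as pd

# Hypothetical yearly additions in the same shape as country_growth.
growth = pd.DataFrame({
    "year_added": [2016, 2017, 2017, 2018],
    "country": ["India", "India", "Japan", "Japan"],
    "count": [5, 8, 2, 4],
})

# Years as rows, countries as columns; years with no additions become 0.
wide = growth.pivot(index="year_added", columns="country", values="count").fillna(0)

# Running total of titles added up to and including each year.
cumulative = wide.cumsum()
print(cumulative)
```

Filling gaps with 0 also avoids broken line segments for countries that have no titles in a given year.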
Netflix’s catalog has grown significantly in both volume and global diversity. While the U.S. still dominates, countries like India and South Korea are contributing more each year. Genre trends reflect both global favorites and regional preferences. Overall, the data suggests that Netflix is becoming a more internationally balanced platform that serves a wide range of audiences.